{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n\nCross Compilation and RPC\n=========================\n**Author**: `Ziheng Jiang <https://github.com/ZihengJiang/>`_, `Lianmin Zheng <https://github.com/merrymercy/>`_\n\nThis tutorial introduces cross compilation and remote device\nexecution with RPC in TVM.\n\nWith cross compilation and RPC, you can **compile a program on your\nlocal machine then run it on the remote device**. It is useful when\nthe remote device resource are limited, like Raspberry Pi and mobile\nplatforms. In this tutorial, we will use the Raspberry Pi for a CPU example\nand the Firefly-RK3399 for an OpenCL example.\n\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Build TVM Runtime on Device\n---------------------------\n\nThe first step is to build the TVM runtime on the remote device.\n\n<div class=\"alert alert-info\"><h4>Note</h4><p>All instructions in both this section and the next section should be\n  executed on the target device, e.g. Raspberry Pi.  We assume the target\n  is running Linux.</p></div>\n\nSince we do compilation on the local machine, the remote device is only used\nfor running the generated code. We only need to build the TVM runtime on\nthe remote device.\n\n.. code-block:: bash\n\n  git clone --recursive https://github.com/apache/tvm tvm\n  cd tvm\n  make runtime -j2\n\nAfter building the runtime successfully, we need to set environment variables\nin :code:`~/.bashrc` file. We can edit :code:`~/.bashrc`\nusing :code:`vi ~/.bashrc` and add the line below (Assuming your TVM\ndirectory is in :code:`~/tvm`):\n\n.. code-block:: bash\n\n  export PYTHONPATH=$PYTHONPATH:~/tvm/python\n\nTo update the environment variables, execute :code:`source ~/.bashrc`.\n\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Set Up RPC Server on Device\n---------------------------\nTo start an RPC server, run the following command on your remote device\n(Which is Raspberry Pi in this example).\n\n  .. code-block:: bash\n\n    python -m tvm.exec.rpc_server --host 0.0.0.0 --port=9090\n\nIf you see the line below, it means the RPC server started\nsuccessfully on your device.\n\n   .. code-block:: bash\n\n     INFO:root:RPCServer: bind to 0.0.0.0:9090\n\n\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Declare and Cross Compile Kernel on Local Machine\n-------------------------------------------------\n\n<div class=\"alert alert-info\"><h4>Note</h4><p>Now we go back to the local machine, which has a full TVM installed\n  (with LLVM).</p></div>\n\nHere we will declare a simple kernel on the local machine:\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import numpy as np\n\nimport tvm\nfrom tvm import te\nfrom tvm import rpc\nfrom tvm.contrib import utils\n\nn = tvm.runtime.convert(1024)\nA = te.placeholder((n,), name=\"A\")\nB = te.compute((n,), lambda i: A[i] + 1.0, name=\"B\")\ns = te.create_schedule(B.op)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Then we cross compile the kernel.\nThe target should be 'llvm -mtriple=armv7l-linux-gnueabihf' for\nRaspberry Pi 3B, but we use 'llvm' here to make this tutorial runnable\non our webpage building server. See the detailed note in the following block.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "local_demo = True\n\nif local_demo:\n    target = \"llvm\"\nelse:\n    target = \"llvm -mtriple=armv7l-linux-gnueabihf\"\n\nfunc = tvm.build(s, [A, B], target=target, name=\"add_one\")\n# save the lib at a local temp folder\ntemp = utils.tempdir()\npath = temp.relpath(\"lib.tar\")\nfunc.export_library(path)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "<div class=\"alert alert-info\"><h4>Note</h4><p>To run this tutorial with a real remote device, change :code:`local_demo`\n  to False and replace :code:`target` in :code:`build` with the appropriate\n  target triple for your device. The target triple which might be\n  different for different devices. For example, it is\n  :code:`'llvm -mtriple=armv7l-linux-gnueabihf'` for Raspberry Pi 3B and\n  :code:`'llvm -mtriple=aarch64-linux-gnu'` for RK3399.\n\n  Usually, you can query the target by running :code:`gcc -v` on your\n  device, and looking for the line starting with :code:`Target:`\n  (Though it may still be a loose configuration.)\n\n  Besides :code:`-mtriple`, you can also set other compilation options\n  like:\n\n  * -mcpu=<cpuname>\n      Specify a specific chip in the current architecture to generate code for. By default this is inferred from the target triple and autodetected to the current architecture.\n  * -mattr=a1,+a2,-a3,...\n      Override or control specific attributes of the target, such as whether SIMD operations are enabled or not. The default set of attributes is set by the current CPU.\n      To get the list of available attributes, you can do:\n\n      .. code-block:: bash\n\n        llc -mtriple=<your device target triple> -mattr=help\n\n  These options are consistent with `llc <http://llvm.org/docs/CommandGuide/llc.html>`_.\n  It is recommended to set target triple and feature set to contain specific\n  feature available, so we can take full advantage of the features of the\n  board.\n  You can find more details about cross compilation attributes from\n  `LLVM guide of cross compilation <https://clang.llvm.org/docs/CrossCompilation.html>`_.</p></div>\n\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Run CPU Kernel Remotely by RPC\n------------------------------\nWe show how to run the generated CPU kernel on the remote device.\nFirst we obtain an RPC session from remote device.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "if local_demo:\n    remote = rpc.LocalSession()\nelse:\n    # The following is my environment, change this to the IP address of your target device\n    host = \"10.77.1.162\"\n    port = 9090\n    remote = rpc.connect(host, port)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Upload the lib to the remote device, then invoke a device local\ncompiler to relink them. Now `func` is a remote module object.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "remote.upload(path)\nfunc = remote.load_module(\"lib.tar\")\n\n# create arrays on the remote device\ndev = remote.cpu()\na = tvm.nd.array(np.random.uniform(size=1024).astype(A.dtype), dev)\nb = tvm.nd.array(np.zeros(1024, dtype=A.dtype), dev)\n# the function will run on the remote device\nfunc(a, b)\nnp.testing.assert_equal(b.numpy(), a.numpy() + 1)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "When you want to evaluate the performance of the kernel on the remote\ndevice, it is important to avoid the overhead of network.\n:code:`time_evaluator` will returns a remote function that runs the\nfunction over number times, measures the cost per run on the remote\ndevice and returns the measured cost. Network overhead is excluded.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "time_f = func.time_evaluator(func.entry_name, dev, number=10)\ncost = time_f(a, b).mean\nprint(\"%g secs/op\" % cost)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Run OpenCL Kernel Remotely by RPC\n---------------------------------\nFor remote OpenCL devices, the workflow is almost the same as above.\nYou can define the kernel, upload files, and run via RPC.\n\n<div class=\"alert alert-info\"><h4>Note</h4><p>Raspberry Pi does not support OpenCL, the following code is tested on\n   Firefly-RK3399. You may follow this `tutorial <https://gist.github.com/mli/585aed2cec0b5178b1a510f9f236afa2>`_\n   to setup the OS and OpenCL driver for RK3399.\n\n   Also we need to build the runtime with OpenCL enabled on rk3399 board. In the TVM\n   root directory, execute</p></div>\n\n.. code-block:: bash\n\n   cp cmake/config.cmake .\n   sed -i \"s/USE_OPENCL OFF/USE_OPENCL ON/\" config.cmake\n   make runtime -j4\n\nThe following function shows how we run an OpenCL kernel remotely\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "def run_opencl():\n    # NOTE: This is the setting for my rk3399 board. You need to modify\n    # them according to your environment.\n    opencl_device_host = \"10.77.1.145\"\n    opencl_device_port = 9090\n    target = tvm.target.Target(\"opencl\", host=\"llvm -mtriple=aarch64-linux-gnu\")\n\n    # create schedule for the above \"add one\" compute declaration\n    s = te.create_schedule(B.op)\n    xo, xi = s[B].split(B.op.axis[0], factor=32)\n    s[B].bind(xo, te.thread_axis(\"blockIdx.x\"))\n    s[B].bind(xi, te.thread_axis(\"threadIdx.x\"))\n    func = tvm.build(s, [A, B], target=target)\n\n    remote = rpc.connect(opencl_device_host, opencl_device_port)\n\n    # export and upload\n    path = temp.relpath(\"lib_cl.tar\")\n    func.export_library(path)\n    remote.upload(path)\n    func = remote.load_module(\"lib_cl.tar\")\n\n    # run\n    dev = remote.cl()\n    a = tvm.nd.array(np.random.uniform(size=1024).astype(A.dtype), dev)\n    b = tvm.nd.array(np.zeros(1024, dtype=A.dtype), dev)\n    func(a, b)\n    np.testing.assert_equal(b.numpy(), a.numpy() + 1)\n    print(\"OpenCL test passed!\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Summary\n-------\nThis tutorial provides a walk through of cross compilation and RPC\nfeatures in TVM.\n\n- Set up an RPC server on the remote device.\n- Set up the target device configuration to cross compile the kernels on the\n  local machine.\n- Upload and run the kernels remotely via the RPC API.\n\n"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.6.9"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}